# Multimodal speech understanding
## Ultravox v0.6 Qwen 3 32B (fixie-ai)
- **License:** MIT · **Task:** Audio-to-Text · **Library:** Transformers · **Languages:** multilingual
- 1,240 downloads · 0 likes

Ultravox is a large multimodal speech-language model that understands and processes speech input, with support for multiple languages and noisy environments.
## Ultravox v0.6 Gemma 3 27B (fixie-ai)
- **License:** MIT · **Task:** Audio-to-Text · **Library:** Transformers · **Languages:** multilingual
- 641 downloads · 2 likes

Ultravox is a multimodal large speech-language model that processes speech and text inputs simultaneously, making it well suited to speech-interaction scenarios.
## Ultravox v0.6 Llama 3.3 70B (fixie-ai)
- **License:** MIT · **Task:** Audio-to-Text · **Library:** Transformers · **Languages:** multilingual
- 196 downloads · 0 likes

Ultravox is a large multimodal speech-language model that combines a pre-trained large language model with a speech encoder, handling both speech and text inputs.
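Since the Ultravox checkpoints ship as Transformers models, inference follows the library's chat-style pipeline pattern. The sketch below shows how the inputs are assembled; the model ID, the `trust_remote_code` requirement, and the exact pipeline input keys are assumptions based on the Ultravox model cards, and the inference call itself is left commented out because it requires downloading the checkpoint.

```python
import numpy as np

# One second of silence at 16 kHz stands in for a real speech recording.
SAMPLE_RATE = 16_000
audio = np.zeros(SAMPLE_RATE, dtype=np.float32)

# Ultravox takes a chat-style prompt alongside raw audio: the model
# consumes the speech waveform directly rather than a transcript.
turns = [
    {"role": "system", "content": "You are a helpful voice assistant."},
]

# Assumed inference interface (not executed here; needs the checkpoint):
# from transformers import pipeline
# pipe = pipeline(model="fixie-ai/ultravox-v0_6-qwen-3-32b",
#                 trust_remote_code=True)
# reply = pipe({"audio": audio, "turns": turns,
#               "sampling_rate": SAMPLE_RATE}, max_new_tokens=30)

print(audio.shape[0], str(audio.dtype), len(turns))
```

The audio is passed as a raw float array with its sampling rate, so any decoder (librosa, soundfile, etc.) can be used upstream of the pipeline.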
## Ichigo Llama 3.1 S Instruct v0.3 Phase 2 (homebrewltd)
- **License:** Apache-2.0 · **Task:** Audio-to-Text · **Language:** English
- 16 downloads · 5 likes

The Ichigo-llama3s series natively understands both audio and text input. It is based on the Llama 3 architecture and uses WhisperVQ as the tokenizer for audio files.
## SpeechLLM 2B (skit-ai)
- **License:** Apache-2.0 · **Task:** Audio-to-Text · **Library:** Transformers · **Language:** English
- 237 downloads · 16 likes

SpeechLLM is a multimodal large language model trained to predict speaker-turn metadata in conversations: speech activity, transcribed text, and the speaker's gender, age, accent, and emotion.
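The per-turn metadata SpeechLLM predicts can be pictured as a small structured record. The field names and label values below are illustrative assumptions for the categories listed in the description, not the model's exact output schema.

```python
import json

# Illustrative shape of one predicted speaker-turn record; the field
# names and label sets are assumptions, not SpeechLLM's actual schema.
turn_metadata = {
    "speech_activity": True,
    "transcript": "yes, I can hear you now",
    "gender": "female",
    "age": "middle-aged",
    "accent": "american",
    "emotion": "neutral",
}

print(json.dumps(turn_metadata, indent=2))
```

Emitting the prediction as one record per speaker turn makes it easy to join the metadata back onto a diarized conversation log downstream.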